Dataset statistics
| Number of variables | 15 |
|---|---|
| Number of observations | 99003 |
| Missing cells | 177 |
| Missing cells (%) | < 0.1% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 16.4 MiB |
| Average record size in memory | 173.8 B |
Variable types
| Numeric | 14 |
|---|---|
| Categorical | 1 |
age is highly correlated with dob_year | High correlation |
dob_year is highly correlated with age | High correlation |
friend_count is highly correlated with friendships_initiated | High correlation |
friendships_initiated is highly correlated with friend_count | High correlation |
likes is highly correlated with mobile_likes and 1 other fields | High correlation |
likes_received is highly correlated with mobile_likes_received and 1 other fields | High correlation |
mobile_likes is highly correlated with likes | High correlation |
mobile_likes_received is highly correlated with likes_received and 1 other fields | High correlation |
www_likes is highly correlated with likes | High correlation |
www_likes_received is highly correlated with likes_received and 1 other fields | High correlation |
age is highly correlated with dob_year | High correlation |
dob_year is highly correlated with age | High correlation |
friend_count is highly correlated with friendships_initiated and 3 other fields | High correlation |
friendships_initiated is highly correlated with friend_count and 2 other fields | High correlation |
likes is highly correlated with likes_received and 4 other fields | High correlation |
likes_received is highly correlated with friend_count and 5 other fields | High correlation |
mobile_likes is highly correlated with likes and 3 other fields | High correlation |
mobile_likes_received is highly correlated with friend_count and 5 other fields | High correlation |
www_likes is highly correlated with likes and 1 other fields | High correlation |
www_likes_received is highly correlated with friend_count and 5 other fields | High correlation |
age is highly correlated with dob_year | High correlation |
dob_year is highly correlated with age | High correlation |
friend_count is highly correlated with friendships_initiated | High correlation |
friendships_initiated is highly correlated with friend_count | High correlation |
likes is highly correlated with likes_received and 3 other fields | High correlation |
likes_received is highly correlated with likes and 3 other fields | High correlation |
mobile_likes is highly correlated with likes and 2 other fields | High correlation |
mobile_likes_received is highly correlated with likes and 3 other fields | High correlation |
www_likes_received is highly correlated with likes and 2 other fields | High correlation |
friendships_initiated is highly correlated with friend_count | High correlation |
dob_year is highly correlated with age | High correlation |
age is highly correlated with dob_year | High correlation |
likes_received is highly correlated with mobile_likes_received and 1 other fields | High correlation |
mobile_likes_received is highly correlated with likes_received and 1 other fields | High correlation |
www_likes is highly correlated with likes | High correlation |
mobile_likes is highly correlated with likes | High correlation |
likes is highly correlated with www_likes and 1 other fields | High correlation |
friend_count is highly correlated with friendships_initiated | High correlation |
www_likes_received is highly correlated with likes_received and 1 other fields | High correlation |
likes_received is highly skewed (γ1 = 112.0745682) | Skewed |
mobile_likes_received is highly skewed (γ1 = 107.5312999) | Skewed |
www_likes_received is highly skewed (γ1 = 126.257317) | Skewed |
userid has unique values | Unique |
friend_count has 1962 (2.0%) zeros | Zeros |
friendships_initiated has 2997 (3.0%) zeros | Zeros |
likes has 22308 (22.5%) zeros | Zeros |
likes_received has 24428 (24.7%) zeros | Zeros |
mobile_likes has 35056 (35.4%) zeros | Zeros |
mobile_likes_received has 30003 (30.3%) zeros | Zeros |
www_likes has 60999 (61.6%) zeros | Zeros |
www_likes_received has 36864 (37.2%) zeros | Zeros |
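The γ1 values in the skewness alerts above are sample skewness coefficients. The following is a minimal sketch, assuming the standard moment-based definition g1 = m3 / m2^(3/2) and, as a second variant, the bias-adjusted form that pandas' `Series.skew` computes. Which variant the report uses is an assumption here, and the function names are illustrative only.

```python
import math

def moment_skewness(values):
    """Population skewness g1 = m3 / m2**1.5, built from central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def adjusted_skewness(values):
    """Bias-adjusted sample skewness G1 = g1 * sqrt(n * (n - 1)) / (n - 2)."""
    n = len(values)
    return moment_skewness(values) * math.sqrt(n * (n - 1)) / (n - 2)

# Toy data (not from the report): mostly zeros plus one large value,
# the same shape that drives the *_likes_received columns to extreme skew.
sample = [0, 0, 0, 0, 10]
print(moment_skewness(sample))
print(adjusted_skewness(sample))
```

A symmetric sample gives a skewness of zero. A long right tail, such as a handful of users with very large like counts, pushes the value far above zero.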
Reproduction
| Analysis started | 2021-08-17 07:01:24.710419 |
|---|---|
| Analysis finished | 2021-08-17 07:02:10.106997 |
| Duration | 45.4 seconds |
| Software version | pandas-profiling v3.0.0 |
| Download configuration | config.json |
userid
Real number (ℝ≥0)
| Distinct | 99003 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1597045.208 |
| Minimum | 1000008 |
|---|---|
| Maximum | 2193542 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 773.6 KiB |
Quantile statistics
| Minimum | 1000008 |
|---|---|
| 5-th percentile | 1060618.3 |
| Q1 | 1298805.5 |
| median | 1596148 |
| Q3 | 1895744 |
| 95-th percentile | 2133357.1 |
| Maximum | 2193542 |
| Range | 1193534 |
| Interquartile range (IQR) | 596938.5 |
Descriptive statistics
| Standard deviation | 344059.1775 |
|---|---|
| Coefficient of variation (CV) | 0.2154348391 |
| Kurtosis | -1.199556831 |
| Mean | 1597045.208 |
| Median Absolute Deviation (MAD) | 298438 |
| Skewness | 0.0001076605667 |
| Sum | 1.581122667 × 10¹¹ |
| Variance | 1.183767176 × 10¹¹ |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1048576 | 1 | < 0.1% |
| 1025225 | 1 | < 0.1% |
| 1573073 | 1 | < 0.1% |
| 1498737 | 1 | < 0.1% |
| 1850989 | 1 | < 0.1% |
| 1441935 | 1 | < 0.1% |
| 1427355 | 1 | < 0.1% |
| 1468006 | 1 | < 0.1% |
| 1605221 | 1 | < 0.1% |
| 2117085 | 1 | < 0.1% |
| Other values (98993) | 98993 |
| Value | Count | Frequency (%) |
| 1000008 | 1 | |
| 1000013 | 1 | |
| 1000015 | 1 | |
| 1000038 | 1 | |
| 1000059 | 1 | |
| 1000061 | 1 | |
| 1000068 | 1 | |
| 1000094 | 1 | |
| 1000103 | 1 | |
| 1000125 | 1 |
| Value | Count | Frequency (%) |
| 2193542 | 1 | |
| 2193538 | 1 | |
| 2193522 | 1 | |
| 2193499 | 1 | |
| 2193485 | 1 | |
| 2193473 | 1 | |
| 2193468 | 1 | |
| 2193465 | 1 | |
| 2193460 | 1 | |
| 2193418 | 1 |
age
Real number (ℝ≥0)
| Distinct | 101 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 37.28022383 |
| Minimum | 13 |
|---|---|
| Maximum | 113 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 773.6 KiB |
Quantile statistics
| Minimum | 13 |
|---|---|
| 5-th percentile | 15 |
| Q1 | 20 |
| median | 28 |
| Q3 | 50 |
| 95-th percentile | 90 |
| Maximum | 113 |
| Range | 100 |
| Interquartile range (IQR) | 30 |
Descriptive statistics
| Standard deviation | 22.58974831 |
|---|---|
| Coefficient of variation (CV) | 0.6059445462 |
| Kurtosis | 1.561446767 |
| Mean | 37.28022383 |
| Median Absolute Deviation (MAD) | 10 |
| Skewness | 1.415260654 |
| Sum | 3690854 |
| Variance | 510.2967289 |
| Monotonicity | Not monotonic |
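Several of the descriptive statistics in the table above are simple functions of the others. The sketch below shows how the coefficient of variation, the median absolute deviation and the interquartile range can be computed. It assumes a sample standard deviation (n - 1 denominator) and linearly interpolated quartiles. The helper names are illustrative and not taken from the report's software.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)

def median_absolute_deviation(values):
    """Median of the absolute deviations from the median."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])

def interquartile_range(values):
    """Q3 - Q1, with quartiles interpolated linearly between data points."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

# Consistency check using figures from the table above:
# standard deviation 22.58974831 and mean 37.28022383.
cv_from_report = 22.58974831 / 37.28022383
print(cv_from_report)
```

The printed ratio can be compared with the coefficient of variation listed in the table. The interquartile range of 30 likewise equals Q3 - Q1 = 50 - 20.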
| Value | Count | Frequency (%) |
| 18 | 5196 | 5.2% |
| 23 | 4404 | 4.4% |
| 19 | 4391 | 4.4% |
| 20 | 3769 | 3.8% |
| 21 | 3671 | 3.7% |
| 25 | 3641 | 3.7% |
| 17 | 3283 | 3.3% |
| 16 | 3086 | 3.1% |
| 22 | 3032 | 3.1% |
| 24 | 2827 | 2.9% |
| Other values (91) | 61703 |
| Value | Count | Frequency (%) |
| 13 | 484 | 0.5% |
| 14 | 1925 | 1.9% |
| 15 | 2618 | |
| 16 | 3086 | |
| 17 | 3283 | |
| 18 | 5196 | |
| 19 | 4391 | |
| 20 | 3769 | |
| 21 | 3671 | |
| 22 | 3032 |
| Value | Count | Frequency (%) |
| 113 | 202 | 0.2% |
| 112 | 18 | < 0.1% |
| 111 | 18 | < 0.1% |
| 110 | 15 | < 0.1% |
| 109 | 9 | < 0.1% |
| 108 | 1661 | |
| 107 | 98 | 0.1% |
| 106 | 125 | 0.1% |
| 105 | 80 | 0.1% |
| 104 | 73 | 0.1% |
dob_day
Real number (ℝ≥0)
| Distinct | 31 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 14.53040817 |
| Minimum | 1 |
|---|---|
| Maximum | 31 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 773.6 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 7 |
| median | 14 |
| Q3 | 22 |
| 95-th percentile | 29 |
| Maximum | 31 |
| Range | 30 |
| Interquartile range (IQR) | 15 |
Descriptive statistics
| Standard deviation | 9.015606359 |
|---|---|
| Coefficient of variation (CV) | 0.6204647697 |
| Kurtosis | -1.188960111 |
| Mean | 14.53040817 |
| Median Absolute Deviation (MAD) | 8 |
| Skewness | 0.1078407568 |
| Sum | 1438554 |
| Variance | 81.28115802 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 7900 | 8.0% |
| 10 | 4030 | 4.1% |
| 15 | 3555 | 3.6% |
| 5 | 3545 | 3.6% |
| 12 | 3413 | 3.4% |
| 2 | 3409 | 3.4% |
| 3 | 3291 | 3.3% |
| 17 | 3266 | 3.3% |
| 20 | 3263 | 3.3% |
| 14 | 3219 | 3.3% |
| Other values (21) | 60112 |
| Value | Count | Frequency (%) |
| 1 | 7900 | |
| 2 | 3409 | |
| 3 | 3291 | |
| 4 | 3217 | |
| 5 | 3545 | |
| 6 | 3108 | 3.1% |
| 7 | 3010 | 3.0% |
| 8 | 3202 | |
| 9 | 3003 | 3.0% |
| 10 | 4030 |
| Value | Count | Frequency (%) |
| 31 | 1507 | |
| 30 | 2530 | |
| 29 | 2508 | |
| 28 | 2955 | |
| 27 | 2755 | |
| 26 | 2753 | |
| 25 | 3217 | |
| 24 | 2807 | |
| 23 | 2864 | |
| 22 | 2838 |
dob_year
Real number (ℝ≥0)
| Distinct | 101 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1975.719776 |
| Minimum | 1900 |
|---|---|
| Maximum | 2000 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 773.6 KiB |
Quantile statistics
| Minimum | 1900 |
|---|---|
| 5-th percentile | 1923 |
| Q1 | 1963 |
| median | 1985 |
| Q3 | 1993 |
| 95-th percentile | 1998 |
| Maximum | 2000 |
| Range | 100 |
| Interquartile range (IQR) | 30 |
Descriptive statistics
| Standard deviation | 22.58974831 |
|---|---|
| Coefficient of variation (CV) | 0.01143368032 |
| Kurtosis | 1.561446767 |
| Mean | 1975.719776 |
| Median Absolute Deviation (MAD) | 10 |
| Skewness | -1.415260654 |
| Sum | 195602185 |
| Variance | 510.2967289 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1995 | 5196 | 5.2% |
| 1990 | 4404 | 4.4% |
| 1994 | 4391 | 4.4% |
| 1993 | 3769 | 3.8% |
| 1992 | 3671 | 3.7% |
| 1988 | 3641 | 3.7% |
| 1996 | 3283 | 3.3% |
| 1997 | 3086 | 3.1% |
| 1991 | 3032 | 3.1% |
| 1989 | 2827 | 2.9% |
| Other values (91) | 61703 |
| Value | Count | Frequency (%) |
| 1900 | 202 | 0.2% |
| 1901 | 18 | < 0.1% |
| 1902 | 18 | < 0.1% |
| 1903 | 15 | < 0.1% |
| 1904 | 9 | < 0.1% |
| 1905 | 1661 | |
| 1906 | 98 | 0.1% |
| 1907 | 125 | 0.1% |
| 1908 | 80 | 0.1% |
| 1909 | 73 | 0.1% |
| Value | Count | Frequency (%) |
| 2000 | 484 | 0.5% |
| 1999 | 1925 | 1.9% |
| 1998 | 2618 | |
| 1997 | 3086 | |
| 1996 | 3283 | |
| 1995 | 5196 | |
| 1994 | 4391 | |
| 1993 | 3769 | |
| 1992 | 3671 | |
| 1991 | 3032 |
dob_month
Real number (ℝ≥0)
| Distinct | 12 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.283365151 |
| Minimum | 1 |
|---|---|
| Maximum | 12 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 773.6 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 3 |
| median | 6 |
| Q3 | 9 |
| 95-th percentile | 12 |
| Maximum | 12 |
| Range | 11 |
| Interquartile range (IQR) | 6 |
Descriptive statistics
| Standard deviation | 3.529671569 |
|---|---|
| Coefficient of variation (CV) | 0.5617485987 |
| Kurtosis | -1.240397572 |
| Mean | 6.283365151 |
| Median Absolute Deviation (MAD) | 3 |
| Skewness | 0.03129550742 |
| Sum | 622072 |
| Variance | 12.45858138 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 11772 | |
| 10 | 8476 | |
| 5 | 8271 | |
| 8 | 8266 | |
| 3 | 8110 | |
| 7 | 8021 | |
| 9 | 7939 | |
| 12 | 7894 | |
| 4 | 7810 | |
| 2 | 7632 | |
| Other values (2) | 14812 |
| Value | Count | Frequency (%) |
| 1 | 11772 | |
| 2 | 7632 | |
| 3 | 8110 | |
| 4 | 7810 | |
| 5 | 8271 | |
| 6 | 7607 | |
| 7 | 8021 | |
| 8 | 8266 | |
| 9 | 7939 | |
| 10 | 8476 |
| Value | Count | Frequency (%) |
| 12 | 7894 | |
| 11 | 7205 | |
| 10 | 8476 | |
| 9 | 7939 | |
| 8 | 8266 | |
| 7 | 8021 | |
| 6 | 7607 | |
| 5 | 8271 | |
| 4 | 7810 | |
| 3 | 8110 |
gender
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 175 |
| Missing (%) | 0.2% |
| Memory size | 5.8 MiB |
| male | |
|---|---|
| female |
Length
| Max length | 6 |
|---|---|
| Median length | 4 |
| Mean length | 4.814627434 |
| Min length | 4 |
Characters and Unicode
| Total characters | 475820 |
|---|---|
| Distinct characters | 5 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | male |
|---|---|
| 2nd row | female |
| 3rd row | male |
| 4th row | female |
| 5th row | male |
Common Values
| Value | Count | Frequency (%) |
| male | 58574 | 59.2% |
| female | 40254 | 40.7% |
| (Missing) | 175 | 0.2% |
Pie chart
| Value | Count | Frequency (%) |
| male | 58574 | |
| female | 40254 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 139082 | |
| m | 98828 | |
| a | 98828 | |
| l | 98828 | |
| f | 40254 | 8.5% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 475820 |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 139082 | |
| m | 98828 | |
| a | 98828 | |
| l | 98828 | |
| f | 40254 | 8.5% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 475820 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 139082 | |
| m | 98828 | |
| a | 98828 | |
| l | 98828 | |
| f | 40254 | 8.5% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 475820 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 139082 | |
| m | 98828 | |
| a | 98828 | |
| l | 98828 | |
| f | 40254 | 8.5% |
tenure
Real number (ℝ≥0)
| Distinct | 2426 |
|---|---|
| Distinct (%) | 2.5% |
| Missing | 2 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 537.8873749 |
| Minimum | 0 |
|---|---|
| Maximum | 3139 |
| Zeros | 70 |
| Zeros (%) | 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 773.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 47 |
| Q1 | 226 |
| median | 412 |
| Q3 | 675 |
| 95-th percentile | 1575 |
| Maximum | 3139 |
| Range | 3139 |
| Interquartile range (IQR) | 449 |
Descriptive statistics
| Standard deviation | 457.6498739 |
|---|---|
| Coefficient of variation (CV) | 0.8508284359 |
| Kurtosis | 2.199058275 |
| Mean | 537.8873749 |
| Median Absolute Deviation (MAD) | 213 |
| Skewness | 1.535680925 |
| Sum | 53251388 |
| Variance | 209443.4071 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 300 | 173 | 0.2% |
| 303 | 170 | 0.2% |
| 242 | 164 | 0.2% |
| 272 | 163 | 0.2% |
| 257 | 161 | 0.2% |
| 297 | 161 | 0.2% |
| 280 | 160 | 0.2% |
| 285 | 160 | 0.2% |
| 284 | 158 | 0.2% |
| 278 | 158 | 0.2% |
| Other values (2416) | 97373 |
| Value | Count | Frequency (%) |
| 0 | 70 | |
| 1 | 60 | |
| 2 | 72 | |
| 3 | 79 | |
| 4 | 86 | |
| 5 | 92 | |
| 6 | 93 | |
| 7 | 84 | |
| 8 | 87 | |
| 9 | 93 |
| Value | Count | Frequency (%) |
| 3139 | 3 | |
| 3129 | 1 | < 0.1% |
| 3128 | 1 | < 0.1% |
| 3101 | 1 | < 0.1% |
| 3019 | 1 | < 0.1% |
| 2958 | 1 | < 0.1% |
| 2926 | 1 | < 0.1% |
| 2888 | 1 | < 0.1% |
| 2822 | 1 | < 0.1% |
| 2788 | 1 | < 0.1% |
friend_count
Real number (ℝ≥0)
HIGH CORRELATION, ZEROS
| Distinct | 2562 |
|---|---|
| Distinct (%) | 2.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 196.3507873 |
| Minimum | 0 |
|---|---|
| Maximum | 4923 |
| Zeros | 1962 |
| Zeros (%) | 2.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 773.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 3 |
| Q1 | 31 |
| median | 82 |
| Q3 | 206 |
| 95-th percentile | 720 |
| Maximum | 4923 |
| Range | 4923 |
| Interquartile range (IQR) | 175 |
Descriptive statistics
| Standard deviation | 387.304229 |
|---|---|
| Coefficient of variation (CV) | 1.972511719 |
| Kurtosis | 50.09427289 |
| Mean | 196.3507873 |
| Median Absolute Deviation (MAD) | 64 |
| Skewness | 6.059008484 |
| Sum | 19439317 |
| Variance | 150004.5658 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 1962 | 2.0% |
| 1 | 1816 | 1.8% |
| 2 | 1117 | 1.1% |
| 3 | 860 | 0.9% |
| 5 | 789 | 0.8% |
| 4 | 749 | 0.8% |
| 10 | 737 | 0.7% |
| 24 | 732 | 0.7% |
| 6 | 720 | 0.7% |
| 29 | 719 | 0.7% |
| Other values (2552) | 88802 |
| Value | Count | Frequency (%) |
| 0 | 1962 | |
| 1 | 1816 | |
| 2 | 1117 | |
| 3 | 860 | |
| 4 | 749 | 0.8% |
| 5 | 789 | |
| 6 | 720 | 0.7% |
| 7 | 671 | 0.7% |
| 8 | 718 | 0.7% |
| 9 | 700 | 0.7% |
| Value | Count | Frequency (%) |
| 4923 | 1 | |
| 4917 | 1 | |
| 4863 | 1 | |
| 4845 | 1 | |
| 4844 | 1 | |
| 4826 | 1 | |
| 4817 | 1 | |
| 4803 | 1 | |
| 4797 | 1 | |
| 4794 | 1 |
friendships_initiated
Real number (ℝ≥0)
HIGH CORRELATION, ZEROS
| Distinct | 1519 |
|---|---|
| Distinct (%) | 1.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 107.4524711 |
| Minimum | 0 |
|---|---|
| Maximum | 4144 |
| Zeros | 2997 |
| Zeros (%) | 3.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 773.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 17 |
| median | 46 |
| Q3 | 117 |
| 95-th percentile | 418 |
| Maximum | 4144 |
| Range | 4144 |
| Interquartile range (IQR) | 100 |
Descriptive statistics
| Standard deviation | 188.786951 |
|---|---|
| Coefficient of variation (CV) | 1.756934475 |
| Kurtosis | 42.53560096 |
| Mean | 107.4524711 |
| Median Absolute Deviation (MAD) | 36 |
| Skewness | 5.150757415 |
| Sum | 10638117 |
| Variance | 35640.51287 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 2997 | 3.0% |
| 1 | 2212 | 2.2% |
| 2 | 1551 | 1.6% |
| 3 | 1355 | 1.4% |
| 4 | 1352 | 1.4% |
| 6 | 1328 | 1.3% |
| 5 | 1328 | 1.3% |
| 11 | 1319 | 1.3% |
| 8 | 1314 | 1.3% |
| 13 | 1279 | 1.3% |
| Other values (1509) | 82968 |
| Value | Count | Frequency (%) |
| 0 | 2997 | |
| 1 | 2212 | |
| 2 | 1551 | |
| 3 | 1355 | |
| 4 | 1352 | |
| 5 | 1328 | |
| 6 | 1328 | |
| 7 | 1237 | |
| 8 | 1314 | |
| 9 | 1245 |
| Value | Count | Frequency (%) |
| 4144 | 1 | |
| 3654 | 1 | |
| 3594 | 1 | |
| 3538 | 1 | |
| 3415 | 1 | |
| 3238 | 1 | |
| 3233 | 1 | |
| 3086 | 1 | |
| 3078 | 1 | |
| 3024 | 1 |
likes
Real number (ℝ≥0)
| Distinct | 2924 |
|---|---|
| Distinct (%) | 3.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 156.0787855 |
| Minimum | 0 |
|---|---|
| Maximum | 25111 |
| Zeros | 22308 |
| Zeros (%) | 22.5% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 773.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1 |
| median | 11 |
| Q3 | 81 |
| 95-th percentile | 726 |
| Maximum | 25111 |
| Range | 25111 |
| Interquartile range (IQR) | 80 |
Descriptive statistics
| Standard deviation | 572.2806808 |
|---|---|
| Coefficient of variation (CV) | 3.666614134 |
| Kurtosis | 200.4456878 |
| Mean | 156.0787855 |
| Median Absolute Deviation (MAD) | 11 |
| Skewness | 11.02370356 |
| Sum | 15452268 |
| Variance | 327505.1777 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 22308 | |
| 1 | 6928 | 7.0% |
| 2 | 4434 | 4.5% |
| 3 | 3240 | 3.3% |
| 4 | 2507 | 2.5% |
| 5 | 2027 | 2.0% |
| 6 | 1806 | 1.8% |
| 7 | 1618 | 1.6% |
| 8 | 1430 | 1.4% |
| 9 | 1381 | 1.4% |
| Other values (2914) | 51324 |
| Value | Count | Frequency (%) |
| 0 | 22308 | |
| 1 | 6928 | 7.0% |
| 2 | 4434 | 4.5% |
| 3 | 3240 | 3.3% |
| 4 | 2507 | 2.5% |
| 5 | 2027 | 2.0% |
| 6 | 1806 | 1.8% |
| 7 | 1618 | 1.6% |
| 8 | 1430 | 1.4% |
| 9 | 1381 | 1.4% |
| Value | Count | Frequency (%) |
| 25111 | 1 | |
| 21652 | 1 | |
| 16732 | 1 | |
| 16583 | 1 | |
| 14799 | 1 | |
| 14355 | 1 | |
| 14050 | 1 | |
| 14039 | 1 | |
| 13692 | 1 | |
| 13622 | 1 |
likes_received
Real number (ℝ≥0)
HIGH CORRELATION, SKEWED, ZEROS
| Distinct | 2681 |
|---|---|
| Distinct (%) | 2.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 142.6893629 |
| Minimum | 0 |
|---|---|
| Maximum | 261197 |
| Zeros | 24428 |
| Zeros (%) | 24.7% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 773.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1 |
| median | 8 |
| Q3 | 59 |
| 95-th percentile | 561 |
| Maximum | 261197 |
| Range | 261197 |
| Interquartile range (IQR) | 58 |
Descriptive statistics
| Standard deviation | 1387.919613 |
|---|---|
| Coefficient of variation (CV) | 9.726861091 |
| Kurtosis | 17384.94 |
| Mean | 142.6893629 |
| Median Absolute Deviation (MAD) | 8 |
| Skewness | 112.0745682 |
| Sum | 14126675 |
| Variance | 1926320.851 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 24428 | |
| 1 | 7305 | 7.4% |
| 2 | 4541 | 4.6% |
| 3 | 3347 | 3.4% |
| 4 | 2669 | 2.7% |
| 5 | 2373 | 2.4% |
| 6 | 1873 | 1.9% |
| 7 | 1680 | 1.7% |
| 8 | 1538 | 1.6% |
| 9 | 1351 | 1.4% |
| Other values (2671) | 47898 |
| Value | Count | Frequency (%) |
| 0 | 24428 | |
| 1 | 7305 | 7.4% |
| 2 | 4541 | 4.6% |
| 3 | 3347 | 3.4% |
| 4 | 2669 | 2.7% |
| 5 | 2373 | 2.4% |
| 6 | 1873 | 1.9% |
| 7 | 1680 | 1.7% |
| 8 | 1538 | 1.6% |
| 9 | 1351 | 1.4% |
| Value | Count | Frequency (%) |
| 261197 | 1 | |
| 178166 | 1 | |
| 152014 | 1 | |
| 106025 | 1 | |
| 82623 | 1 | |
| 53534 | 1 | |
| 52964 | 1 | |
| 45633 | 1 | |
| 42449 | 1 | |
| 39536 | 1 |
mobile_likes
Real number (ℝ≥0)
HIGH CORRELATION, ZEROS
| Distinct | 2396 |
|---|---|
| Distinct (%) | 2.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 106.1162995 |
| Minimum | 0 |
|---|---|
| Maximum | 25111 |
| Zeros | 35056 |
| Zeros (%) | 35.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 773.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 4 |
| Q3 | 46 |
| 95-th percentile | 481.9 |
| Maximum | 25111 |
| Range | 25111 |
| Interquartile range (IQR) | 46 |
Descriptive statistics
| Standard deviation | 445.2529851 |
|---|---|
| Coefficient of variation (CV) | 4.195896268 |
| Kurtosis | 360.9885806 |
| Mean | 106.1162995 |
| Median Absolute Deviation (MAD) | 4 |
| Skewness | 14.16123656 |
| Sum | 10505832 |
| Variance | 198250.2207 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 35056 | |
| 1 | 6297 | 6.4% |
| 2 | 3941 | 4.0% |
| 3 | 2917 | 2.9% |
| 4 | 2265 | 2.3% |
| 5 | 1794 | 1.8% |
| 6 | 1598 | 1.6% |
| 7 | 1395 | 1.4% |
| 8 | 1212 | 1.2% |
| 9 | 1149 | 1.2% |
| Other values (2386) | 41379 |
| Value | Count | Frequency (%) |
| 0 | 35056 | |
| 1 | 6297 | 6.4% |
| 2 | 3941 | 4.0% |
| 3 | 2917 | 2.9% |
| 4 | 2265 | 2.3% |
| 5 | 1794 | 1.8% |
| 6 | 1598 | 1.6% |
| 7 | 1395 | 1.4% |
| 8 | 1212 | 1.2% |
| 9 | 1149 | 1.2% |
| Value | Count | Frequency (%) |
| 25111 | 1 | |
| 21652 | 1 | |
| 16732 | 1 | |
| 14039 | 1 | |
| 13529 | 1 | |
| 12934 | 1 | |
| 12639 | 1 | |
| 12104 | 1 | |
| 12083 | 1 | |
| 11959 | 1 |
mobile_likes_received
Real number (ℝ≥0)
HIGH CORRELATION, SKEWED, ZEROS
| Distinct | 2004 |
|---|---|
| Distinct (%) | 2.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 84.1204913 |
| Minimum | 0 |
|---|---|
| Maximum | 138561 |
| Zeros | 30003 |
| Zeros (%) | 30.3% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 773.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 4 |
| Q3 | 33 |
| 95-th percentile | 317 |
| Maximum | 138561 |
| Range | 138561 |
| Interquartile range (IQR) | 33 |
Descriptive statistics
| Standard deviation | 839.8894437 |
|---|---|
| Coefficient of variation (CV) | 9.984362083 |
| Kurtosis | 15522.64932 |
| Mean | 84.1204913 |
| Median Absolute Deviation (MAD) | 4 |
| Skewness | 107.5312999 |
| Sum | 8328181 |
| Variance | 705414.2777 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 30003 | |
| 1 | 8243 | 8.3% |
| 2 | 4948 | 5.0% |
| 3 | 3608 | 3.6% |
| 4 | 2944 | 3.0% |
| 5 | 2383 | 2.4% |
| 6 | 2022 | 2.0% |
| 7 | 1745 | 1.8% |
| 8 | 1521 | 1.5% |
| 9 | 1437 | 1.5% |
| Other values (1994) | 40149 |
| Value | Count | Frequency (%) |
| 0 | 30003 | |
| 1 | 8243 | 8.3% |
| 2 | 4948 | 5.0% |
| 3 | 3608 | 3.6% |
| 4 | 2944 | 3.0% |
| 5 | 2383 | 2.4% |
| 6 | 2022 | 2.0% |
| 7 | 1745 | 1.8% |
| 8 | 1521 | 1.5% |
| 9 | 1437 | 1.5% |
| Value | Count | Frequency (%) |
| 138561 | 1 | |
| 131244 | 1 | |
| 89911 | 1 | |
| 73333 | 1 | |
| 43410 | 1 | |
| 30754 | 1 | |
| 30387 | 1 | |
| 27353 | 1 | |
| 20770 | 1 | |
| 18925 | 1 |
www_likes
Real number (ℝ≥0)
| Distinct | 1726 |
|---|---|
| Distinct (%) | 1.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 49.96242538 |
| Minimum | 0 |
|---|---|
| Maximum | 14865 |
| Zeros | 60999 |
| Zeros (%) | 61.6% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 773.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 7 |
| 95-th percentile | 208 |
| Maximum | 14865 |
| Range | 14865 |
| Interquartile range (IQR) | 7 |
Descriptive statistics
| Standard deviation | 285.5601519 |
|---|---|
| Coefficient of variation (CV) | 5.715498191 |
| Kurtosis | 449.1484832 |
| Mean | 49.96242538 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 16.91102529 |
| Sum | 4946430 |
| Variance | 81544.60033 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 60999 | |
| 1 | 4697 | 4.7% |
| 2 | 2760 | 2.8% |
| 3 | 1948 | 2.0% |
| 4 | 1419 | 1.4% |
| 5 | 1202 | 1.2% |
| 6 | 1081 | 1.1% |
| 7 | 897 | 0.9% |
| 8 | 792 | 0.8% |
| 9 | 757 | 0.8% |
| Other values (1716) | 22451 | 22.7% |
| Value | Count | Frequency (%) |
| 0 | 60999 | |
| 1 | 4697 | 4.7% |
| 2 | 2760 | 2.8% |
| 3 | 1948 | 2.0% |
| 4 | 1419 | 1.4% |
| 5 | 1202 | 1.2% |
| 6 | 1081 | 1.1% |
| 7 | 897 | 0.9% |
| 8 | 792 | 0.8% |
| 9 | 757 | 0.8% |
| Value | Count | Frequency (%) |
| 14865 | 1 | |
| 12903 | 1 | |
| 11077 | 1 | |
| 10763 | 1 | |
| 10627 | 1 | |
| 10539 | 1 | |
| 10255 | 1 | |
| 10232 | 1 | |
| 9902 | 1 | |
| 9431 | 1 |
www_likes_received
Real number (ℝ≥0)
HIGH CORRELATION, SKEWED, ZEROS
| Distinct | 1636 |
|---|---|
| Distinct (%) | 1.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 58.56883125 |
| Minimum | 0 |
|---|---|
| Maximum | 129953 |
| Zeros | 36864 |
| Zeros (%) | 37.2% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 773.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 2 |
| Q3 | 20 |
| 95-th percentile | 227 |
| Maximum | 129953 |
| Range | 129953 |
| Interquartile range (IQR) | 20 |
Descriptive statistics
| Standard deviation | 601.416348 |
|---|---|
| Coefficient of variation (CV) | 10.26853934 |
| Kurtosis | 23812.2491 |
| Mean | 58.56883125 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | 126.257317 |
| Sum | 5798490 |
| Variance | 361701.6237 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 36864 | |
| 1 | 8513 | 8.6% |
| 2 | 5111 | 5.2% |
| 3 | 3586 | 3.6% |
| 4 | 2828 | 2.9% |
| 5 | 2317 | 2.3% |
| 6 | 1918 | 1.9% |
| 7 | 1602 | 1.6% |
| 8 | 1445 | 1.5% |
| 9 | 1373 | 1.4% |
| Other values (1626) | 33446 |
| Value | Count | Frequency (%) |
| 0 | 36864 | |
| 1 | 8513 | 8.6% |
| 2 | 5111 | 5.2% |
| 3 | 3586 | 3.6% |
| 4 | 2828 | 2.9% |
| 5 | 2317 | 2.3% |
| 6 | 1918 | 1.9% |
| 7 | 1602 | 1.6% |
| 8 | 1445 | 1.5% |
| 9 | 1373 | 1.4% |
| Value | Count | Frequency (%) |
| 129953 | 1 | |
| 62103 | 1 | |
| 39605 | 1 | |
| 39213 | 1 | |
| 34039 | 1 | |
| 32692 | 1 | |
| 29337 | 1 | |
| 23147 | 1 | |
| 22644 | 1 | |
| 15096 | 1 |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
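The calculation described above can be sketched in a few lines of plain Python. This is an illustrative implementation, not the code used to produce the report. The function name `pearson_r` and the toy data are assumptions.

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations.

    The 1/n factors of the covariance and the two standard deviations cancel,
    so plain sums of deviations are enough.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

xs = [1, 2, 3, 4]
steep = [100 * x + 5 for x in xs]    # steep increasing line
shallow = [0.01 * x - 3 for x in xs] # shallow increasing line
falling = [-2 * x for x in xs]       # decreasing line
print(pearson_r(xs, steep), pearson_r(xs, shallow), pearson_r(xs, falling))
```

Both increasing lines give the same r despite very different slopes, which illustrates the invariance under changes of location and scale. The decreasing line gives the opposite sign.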
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
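As a minimal sketch of that recipe, the code below ranks each variable and then applies the Pearson formula to the ranks. It assumes that tied values share the mean of their rank positions. The function names and sample data are illustrative and not taken from the report.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def pearson_r(xs, ys):
    """Covariance divided by the product of standard deviations (1/n cancels)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

xs = [1, 2, 3, 4, 5]
cubes = [x ** 3 for x in xs]  # monotonic but not linear
print(spearman_rho(xs, cubes), pearson_r(xs, cubes))
```

For the cubic relationship, ρ reaches 1 because the ordering is perfectly preserved, while Pearson's r stays below 1 because the relationship is not a straight line.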
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
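The pair-counting definition above corresponds to the variant usually called tau-a. The sketch below implements exactly that definition. Note that libraries such as SciPy default to tau-b, which additionally corrects for ties, so values can differ when the data contain many repeated entries. The function name is illustrative.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
        # direction == 0 is a tie and counts toward neither total
    return (concordant - discordant) / len(pairs)

print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

In the example, two of the three pairs are concordant and one is discordant, so τ is (2 - 1) / 3.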
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.
First rows
| | userid | age | dob_day | dob_year | dob_month | gender | tenure | friend_count | friendships_initiated | likes | likes_received | mobile_likes | mobile_likes_received | www_likes | www_likes_received |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2094382 | 14 | 19 | 1999 | 11 | male | 266.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1192601 | 14 | 2 | 1999 | 11 | female | 6.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 2083884 | 14 | 16 | 1999 | 11 | male | 13.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 1203168 | 14 | 25 | 1999 | 12 | female | 93.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 1733186 | 14 | 4 | 1999 | 12 | male | 82.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | 1524765 | 14 | 1 | 1999 | 12 | male | 15.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 6 | 1136133 | 13 | 14 | 2000 | 1 | male | 12.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 7 | 1680361 | 13 | 4 | 2000 | 1 | female | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 8 | 1365174 | 13 | 1 | 2000 | 1 | male | 81.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 9 | 1712567 | 13 | 2 | 2000 | 2 | male | 171.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Last rows
| | userid | age | dob_day | dob_year | dob_month | gender | tenure | friend_count | friendships_initiated | likes | likes_received | mobile_likes | mobile_likes_received | www_likes | www_likes_received |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 98993 | 1654565 | 19 | 15 | 1994 | 8 | male | 394.0 | 4538 | 4144 | 4501 | 15088 | 4435 | 5961 | 66 | 9127 |
| 98994 | 2063006 | 20 | 4 | 1993 | 1 | female | 402.0 | 1988 | 332 | 7351 | 106025 | 7248 | 73333 | 103 | 32692 |
| 98995 | 1132164 | 20 | 9 | 1993 | 10 | female | 699.0 | 3611 | 973 | 4507 | 7768 | 4414 | 6909 | 93 | 859 |
| 98996 | 1668695 | 24 | 25 | 1989 | 4 | female | 182.0 | 2938 | 1272 | 6018 | 17765 | 5843 | 11708 | 175 | 6057 |
| 98997 | 1458985 | 28 | 14 | 1985 | 12 | female | 290.0 | 2218 | 1618 | 4626 | 10268 | 4290 | 4250 | 336 | 6018 |
| 98998 | 1268299 | 68 | 4 | 1945 | 4 | female | 541.0 | 2118 | 341 | 3996 | 18089 | 3505 | 11887 | 491 | 6202 |
| 98999 | 1256153 | 18 | 12 | 1995 | 3 | female | 21.0 | 1968 | 1720 | 4401 | 13412 | 4399 | 10592 | 2 | 2820 |
| 99000 | 1195943 | 15 | 10 | 1998 | 5 | female | 111.0 | 2002 | 1524 | 11959 | 12554 | 11959 | 11462 | 0 | 1092 |
| 99001 | 1468023 | 23 | 11 | 1990 | 4 | female | 416.0 | 2560 | 185 | 4506 | 6516 | 4506 | 5760 | 0 | 756 |
| 99002 | 1397896 | 39 | 15 | 1974 | 5 | female | 397.0 | 2049 | 768 | 9410 | 12443 | 9410 | 9530 | 0 | 2913 |
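The sample rows above hint at why several columns are flagged as highly correlated: in every row shown, age and dob_year add up to the same year, and each likes total equals the sum of its mobile and web parts. The sketch below checks this on the ten last rows. The reference year 2013 is inferred from the rows themselves and is an assumption, and the check says nothing certain about the remaining rows.

```python
# Columns copied from the "Last rows" table above:
# (age, dob_year, likes, likes_received,
#  mobile_likes, mobile_likes_received, www_likes, www_likes_received)
LAST_ROWS = [
    (19, 1994, 4501, 15088, 4435, 5961, 66, 9127),
    (20, 1993, 7351, 106025, 7248, 73333, 103, 32692),
    (20, 1993, 4507, 7768, 4414, 6909, 93, 859),
    (24, 1989, 6018, 17765, 5843, 11708, 175, 6057),
    (28, 1985, 4626, 10268, 4290, 4250, 336, 6018),
    (68, 1945, 3996, 18089, 3505, 11887, 491, 6202),
    (18, 1995, 4401, 13412, 4399, 10592, 2, 2820),
    (15, 1998, 11959, 12554, 11959, 11462, 0, 1092),
    (23, 1990, 4506, 6516, 4506, 5760, 0, 756),
    (39, 1974, 9410, 12443, 9410, 9530, 0, 2913),
]

def check_row(row, reference_year=2013):
    """Return three booleans, one per suspected identity (reference_year is assumed)."""
    age, dob_year, likes, likes_recv, m_likes, m_recv, w_likes, w_recv = row
    return (
        age + dob_year == reference_year,
        likes == m_likes + w_likes,
        likes_recv == m_recv + w_recv,
    )

all_hold = all(all(check_row(row)) for row in LAST_ROWS)
print(all_hold)

# Column sums taken from the "Sum" entries of the per-variable statistics.
likes_gap = 15452268 - (10505832 + 4946430)
received_gap = 14126675 - (8328181 + 5798490)
print(likes_gap, received_gap)
```

The small non-zero gaps between the column sums suggest that the likes identities hold for most rows but not strictly for every row in the dataset.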